ASYMPTOTICALLY POINTWISE
OPTIMAL PROCEDURES
IN SEQUENTIAL ANALYSIS

PETER J. BICKEL
and

JOSEPH A. YAHAV
University of California, Berkeley

1. Introduction

After sequential analysis was developed by Wald in the forties [5], Arrow, Blackwell, and Girshick [1] considered the Bayes problem and proved the existence of Bayes solutions. The difficulty of computing the Bayes solutions explicitly led Wald [6] to introduce asymptotic sequential analysis in estimation. “Asymptotic” in his sense, as for all subsequent authors, refers to the limiting behavior of the optimal solution as the cost of observation tends to zero. Chernoff [2] investigated the asymptotic properties of sequential testing. The testing theory was developed further by Schwarz [4] and generalized by Kiefer and Sacks [3]. This paper approaches the asymptotic theory from a slightly different point of view. We introduce the concept of “asymptotic pointwise optimality,” and we construct procedures that are “asymptotically pointwise optimal” (A.P.O.) for certain rates of convergence (as $n \to \infty$) of the a posteriori risk. The rates of convergence that we consider apply, under some regularity conditions, to statistical testing and to estimation with quadratic loss.

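2. Pointwise optimality

Let $\{Y_n, n \ge 1\}$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, where $Y_n$ is $\mathcal{F}_n$-measurable and $\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \cdots \subset \mathcal{F}$ for $n \ge 1$. We assume the following two conditions:

$$P(Y_n > 0) = 1, \tag{2.1}$$

$$Y_n \to 0 \quad \text{a.s.} \tag{2.2}$$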

Define

$$X_n(c) = Y_n + nc \quad \text{for } c > 0. \tag{2.3}$$

Let $T$ be the class of all stopping times defined on the σ-fields $\mathcal{F}_n$. We say that $s \in T$ is “pointwise optimal” if

$$P\left[\frac{X_s(c)}{X_t(c)} \le 1\right] = 1 \quad \text{for all } t \in T. \tag{2.4}$$

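Unfortunately, such $s$'s usually do not exist except in essentially deterministic cases. Let us consider two examples of such situations:

$$Y_n = \frac{V}{n}, \quad V > 0, \tag{2.5}$$

$$\frac{\log Y_n}{n} = U, \quad -\infty < U < 0. \tag{2.6}$$

In these examples one easily sees that the pointwise optimal rule is given by the following:

Example 1: stop as soon as $\dfrac{V}{n(n+1)} < c$;

Example 2: stop as soon as $e^{nU} < \dfrac{c}{1 - e^{U}}$.

Indeed, $X_{n+1}(c) - X_n(c)$ equals $c - V/(n(n+1))$ in example 1 and $c - e^{nU}(1 - e^{U})$ in example 2; both differences increase with $n$, so $X_n(c)$ is minimized at the first $n$ at which the difference becomes positive.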
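The two deterministic examples can be checked numerically. The Python sketch below is our own illustration, not part of the paper: the values of $V$, $U$, and $c$ are arbitrary assumptions, and `first_n` and `brute_force_argmin` are hypothetical helper names. It applies each closed-form rule and compares the result with a brute-force minimization of $X_n(c) = Y_n + nc$.

```python
import math

def first_n(rule, n_max=100_000):
    """Smallest n >= 1 for which rule(n) holds (hypothetical helper)."""
    for n in range(1, n_max + 1):
        if rule(n):
            return n
    raise ValueError("rule did not fire before n_max")

def brute_force_argmin(y, c, n_max=10_000):
    """First minimiser of X_n(c) = y(n) + n*c over 1 <= n <= n_max."""
    return min(range(1, n_max + 1), key=lambda n: y(n) + n * c)

V, U, c = 2.0, -0.3, 1e-4  # illustrative values, not taken from the paper

# Example 1: Y_n = V/n; stop as soon as V/(n(n+1)) < c.
s_ex1 = first_n(lambda n: V / (n * (n + 1)) < c)
# Example 2: Y_n = exp(nU); stop as soon as exp(nU) < c/(1 - exp(U)).
s_ex2 = first_n(lambda n: math.exp(n * U) < c / (1 - math.exp(U)))

print(s_ex1, brute_force_argmin(lambda n: V / n, c))            # 141 141
print(s_ex2, brute_force_argmin(lambda n: math.exp(n * U), c))  # 27 27
```

In both cases the one-step rule lands exactly on the minimizer, which is what makes these examples pointwise optimal.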


These examples will play a role in theorem 2.1. In nondeterministic cases one might hope that, under some conditions, we can get A.P.O. procedures. Let us define these more formally.

Abusing our notation, in a fashion long used in large sample theory, we also use the words “stopping rule” to denote a function $t(\cdot)$ from $(0, \infty)$ to $T$: for $c \in (0, \infty)$, $t(c) \in T$. Now, in analogy with our previous definition, we say that $s(\cdot)$ is A.P.O. if for any other rule $t(\cdot)$,

$$\limsup_{c \to 0} \frac{X_{s(c)}(c)}{X_{t(c)}(c)} \le 1 \quad \text{a.s.} \tag{2.7}$$

Consideration of the deterministic case naturally leads us to hope for asymptotically pointwise optimal solutions in situations where the rate of convergence of $Y_n$ stabilizes. This hope is fulfilled in the following theorem.

Theorem 2.1. (i) If condition (2.1) holds and $nY_n \to V$ a.s., where $V$ is a random variable such that $P(V > 0) = 1$, then the stopping rule determined by “stop the first time that $Y_n/n < c$” is A.P.O.

(ii) If condition (2.1) holds and $(\log Y_n)/n \to U$ a.s., where $U$ is a random variable with $P(U < 0) = 1$, then the rules (ii)a, “stop the first time $Y_n < c$,” and (ii)b, “stop the first time $Y_n(1 - Y_n^{1/n}) < c$,” are A.P.O.
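The three rules of theorem 2.1 are easy to state as functions of an observed path $Y_1, Y_2, \ldots$. The sketch below is our own illustration: `stop_time`, the rule functions, and `simulated_path` are hypothetical names, and the simulated paths (a running mean of Gaussian noise added so that $nY_n \to V$ or $(\log Y_n)/n \to U$) are an assumption chosen only to satisfy the hypotheses of the theorem. Rule (ii)b as coded assumes $0 < Y_n < 1$, as in example 2; for $Y_n \ge 1$ the factor $1 - Y_n^{1/n}$ is not positive and the rule would stop at once.

```python
import math
import random

def stop_time(ys, rule, c):
    """First n (1-based) at which rule(n, Y_n, c) holds along a finite path ys = [Y_1, Y_2, ...]."""
    for n, y in enumerate(ys, start=1):
        if rule(n, y, c):
            return n
    return None  # path too short for this c

def rule_i(n, y, c):    # (i):   stop the first time Y_n / n < c
    return y / n < c

def rule_iia(n, y, c):  # (ii)a: stop the first time Y_n < c
    return y < c

def rule_iib(n, y, c):  # (ii)b: stop the first time Y_n (1 - Y_n^{1/n}) < c; assumes 0 < Y_n < 1
    return y * (1 - y ** (1 / n)) < c

def simulated_path(n_max, kind, rng, V=1.0, U=-1.0, scale=0.1):
    """Hypothetical path with n*Y_n -> V (kind "i") or log(Y_n)/n -> U (kind "ii")."""
    ys, total = [], 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, scale)
        zbar = total / n  # running mean of Gaussian noise; tends to 0 a.s.
        ys.append(V * math.exp(zbar) / n if kind == "i" else math.exp(n * (U + zbar)))
    return ys

rng = random.Random(1)
c = 1e-5
path_i = simulated_path(5_000, "i", rng)
path_ii = simulated_path(200, "ii", rng)
s_i = stop_time(path_i, rule_i, c)
s_iia = stop_time(path_ii, rule_iia, c)
s_iib = stop_time(path_ii, rule_iib, c)
print(s_i, s_iia, s_iib)
```

Since $Y_n(1 - Y_n^{1/n}) < Y_n$ whenever $0 < Y_n < 1$, rule (ii)b never stops later than rule (ii)a on the same path.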

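Proof. Let $s_1(\cdot)$ be the stopping time defined by rule (i), and let $t(\cdot)$ be any other rule. Then

$$\frac{X_{s_1}}{X_t} = \frac{Y_{s_1} + cs_1}{Y_t + ct} = \frac{\dfrac{Y_{s_1}}{s_1 c} + 1}{\dfrac{Y_t}{s_1 c} + \dfrac{t}{s_1}} \le \frac{2}{\dfrac{Y_t}{s_1 c} + \dfrac{t}{s_1}}, \tag{2.8}$$

where the inequality uses $Y_{s_1}/s_1 < c$, which holds by the definition of $s_1$.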
It suffices to show that $\liminf_{c \to 0}\,[Y_t/(s_1 c) + t/s_1] \ge 2$ a.s., but this follows upon remarking that $x + 1/x \ge 2$ for $x > 0$ and applying the following lemma.

Lemma 2.1. If $c_n \to 0$, $t(c_n)/s_1(c_n) \to x \ge 0$, and $t(c_n)$ converges (possibly to $+\infty$) with probability 1, then $\lim_{n \to \infty} Y_{t(c_n)}/(c_n s_1(c_n)) = 1/x$ a.s. (with $1/0 = +\infty$).

Proof of lemma. It follows from the assumptions of the theorem and the definition of $s_1(c)$ that

$$P\left[\lim_{c \to 0} s_1(c) = \infty\right] = 1. \tag{2.9}$$


Suppose first that $P[\lim_n t(c_n) < \infty] > 0$. On this set $\liminf_n Y_{t(c_n)} > 0$, and our lemma will follow in this case if we show that $cs_1(c) \to 0$ a.s. as $c \to 0$. We will in fact show the stronger statement

$$cs_1^2(c) \to V \quad \text{a.s.} \tag{2.10}$$

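This follows immediately from the inequalities

$$\frac{Y_{s_1}}{s_1} < c \le \frac{Y_{s_1 - 1}}{s_1 - 1}, \tag{2.11}$$

$$s_1 Y_{s_1} < s_1^2 c \le \left(\frac{s_1}{s_1 - 1}\right)^2 (s_1 - 1)\, Y_{s_1 - 1}, \tag{2.12}$$

and (2.9).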

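The general case of the lemma, on the set $[t(c_n) \to \infty]$, is a consequence of the identity

$$\frac{Y_t}{s_1 c} = \frac{s_1}{t} \cdot \frac{tY_t}{cs_1^2} \tag{2.13}$$

and our assumptions: $s_1/t \to 1/x$, $tY_t \to V$, and $cs_1^2 \to V$ by (2.10).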

We prove case (ii)a; case (ii)b follows similarly. Let $s_2(\cdot)$ be the rule defined by (ii)a and $t(\cdot)$ any other stopping rule. Again we have

$$\frac{X_{s_2}}{X_t} = \frac{\dfrac{Y_{s_2}}{s_2 c} + 1}{\dfrac{Y_t}{s_2 c} + \dfrac{t}{s_2}} \tag{2.14}$$

and $s_2 \to \infty$ a.s.


But then $Y_{s_2}/(s_2 c) < 1/s_2 \to 0$ a.s. In a fashion analogous to lemma 2.1, we use lemma 2.2.

Lemma 2.2. If $c_n \to 0$, $t(c_n)$ converges a.s. (possibly to $+\infty$), and $t(c_n)/s_2(c_n) \to x < 1$, then $Y_{t(c_n)}/(c_n s_2(c_n)) \to \infty$.

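Proof. We prove first that

$$\frac{s_2(c)}{|\log c|} \to \frac{1}{|U|} \quad \text{a.s.} \tag{2.15}$$

This is a consequence of the inequalities

$$Y_{s_2} < c \le Y_{s_2 - 1}, \tag{2.16}$$

$$\frac{\log Y_{s_2}}{s_2} < \frac{\log c}{s_2} \le \frac{\log Y_{s_2 - 1}}{s_2 - 1} \cdot \frac{s_2 - 1}{s_2} \quad \text{a.s.}, \tag{2.17}$$

since both bounds in (2.17) tend to $U$.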
Now suppose $t(c_n) \to \infty$ a.s. Then

$$\log \frac{Y_t}{cs_2} = t\left(\frac{\log Y_t}{t} - \frac{s_2}{t}\left(\frac{\log s_2}{s_2} + \frac{\log c}{s_2}\right)\right). \tag{2.18}$$

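Now

$$\frac{\log Y_t}{t} \to U \quad \text{a.s.}, \tag{2.19}$$

$$\frac{\log c}{s_2} \to U \quad \text{a.s.}, \tag{2.20}$$

and

$$\frac{s_2(c_n)}{t(c_n)} \to \frac{1}{x} > 1 \tag{2.21}$$

by hypothesis and (2.15). Since, by (2.15), $(\log s_2)/s_2 \to 0$ as well, the expression in parentheses in (2.18) tends to $U(1 - 1/x)$, which is positive because $U < 0$; as $t \to \infty$, the result follows.

The theorem is proved.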

Corollary 2.1. Let $N(c)$ be defined as any solution of $X_{N(c)}(c) = \inf_n X_n(c)$. Then, in both cases of theorem 2.1,

$$\lim_{c \to 0} \frac{X_{s(c)}(c)}{X_{N(c)}(c)} = 1 \quad \text{a.s.} \tag{2.22}$$


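Proof. Note that in the proof of the theorem no use was made of the fact that $t(c)$ is a stopping time. Taking $t(c) = N(c)$ in (2.7) therefore gives $\limsup_{c \to 0} X_{s(c)}(c)/X_{N(c)}(c) \le 1$, while $X_{s(c)}(c) \ge X_{N(c)}(c)$ by the definition of $N(c)$.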
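Corollary 2.1 can be watched at work on a simulated path. In the Python sketch below (our own illustration), the path $Y_n = V\exp(\bar Z_n)/n$, with $\bar Z_n$ a running mean of standard normal draws, is an assumption chosen so that $nY_n \to V$ a.s.; `noisy_case_i_path` and `ratio_to_oracle` are hypothetical names. The oracle $N(c)$ is computed by minimizing $X_n(c)$ over the whole path, which is legitimate only when the path is long enough that $n_{\max} c$ exceeds the minimum.

```python
import math
import random

def noisy_case_i_path(n_max, V, rng):
    """Hypothetical path with n*Y_n -> V a.s.: Y_n = V*exp(mean of n standard normal draws)/n."""
    ys, total = [], 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        ys.append(V * math.exp(total / n) / n)
    return ys

def ratio_to_oracle(ys, c):
    """X_{s(c)}(c) / min_n X_n(c): s(c) is rule (i), the minimum is over the whole path.

    The path must be long enough that n_max * c exceeds the minimum, so that the
    minimiser N(c) over all n lies inside the path.
    """
    xs = [y + n * c for n, y in enumerate(ys, start=1)]
    s = next(n for n, y in enumerate(ys, start=1) if y / n < c)
    return xs[s - 1] / min(xs)

rng = random.Random(0)
path = noisy_case_i_path(200_000, V=1.0, rng=rng)
ratios = {c: ratio_to_oracle(path, c) for c in (1e-4, 1e-6, 1e-8)}
print(ratios)
```

The ratio is at least 1 by construction, and for small $c$ it should be close to 1, as (2.22) asserts.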

Remark. In both cases it may readily be seen that $s(c)$ is strictly better than $t(c)$ if $t(c)/s(c) \not\to 1$ a.s. However, although in case (i) the converse holds (that is, $t(c)$ is also asymptotically pointwise optimal if $s(c)/t(c) \to 1$ a.s.), this is not necessarily true in case (ii). Nevertheless, as the existence of rule (ii)b indicates, here too there are many A.P.O. rules. We shall see more in the conclusion.


3. Sequential estimation with quadratic loss


The main theorem of this section states that for the one-parameter exponential family (Koopman-Darmois, K-D), Bayes estimation with quadratic loss satisfies condition (i) of theorem 2.1, and therefore the rule given in theorem 2.1 (i) is A.P.O. This result can in fact be generalized to an arbitrary family of distributions under some regularity conditions; a theorem of this type is stated at the end of this section. We give the proof only for the K-D family, both for ease of exposition and because we hope to weaken the regularity conditions of our general theorem. Let $\{Z_i, i \ge 1\}$ be a sequence of independent identically distributed random variables having density function $f_\theta(z) = e^{q(\theta)T(z) - b(\theta)}$ with respect to some σ-finite nondegenerate measure $\mu$ on the real line endowed with the Borel σ-field, where $q(\theta)$ and $T(z)$ are real-valued.

We let $\Theta$, the parameter space, be the natural range of $\theta$; that is,

$$\Theta = \left\{\theta : \int e^{q(\theta)T(z)}\,\mu(dz) < \infty\right\},$$

and endow it also with the Borel σ-field and the usual topology.
It follows that $\Theta$ is an interval, finite or infinite, and we assume that

(i) $q(\theta)$ possesses at least two continuous derivatives in the interior of $\Theta$, and

(ii) $q'(\theta) \ne 0$.

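It is known that under these conditions the following propositions hold:

(A) $E_\theta[T(Z_1)] = b'(\theta)/q'(\theta)$.

(B) If $q(\theta) = \theta$, then $b''(\theta) = \operatorname{var}_\theta[T(Z_1)]$.

(C) Let $\Phi(\theta, z) = \log f_\theta(z)$ and $A(\theta) = E_\theta\left[\left(\frac{\partial \Phi}{\partial \theta}(\theta, Z_1)\right)^2\right]$; then $0 < A(\theta) = [q'(\theta)]^2 \operatorname{var}_\theta[T(Z_1)] < \infty$.

(D) For $\theta_0$ in the interior of $\Theta$, the equation $\sum_{i=1}^n \frac{\partial \Phi}{\partial \theta}(\theta, Z_i) = 0$ eventually has a unique solution $\hat\theta_n$, the maximum likelihood estimate, and $\hat\theta_n \to \theta_0$ a.s. $P_{\theta_0}$, where $P_{\theta_0}$ is the measure induced on the space of all real sequences $\{z_1, z_2, \ldots\}$ by the density $f_{\theta_0}(z)$.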
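Propositions (A) and (C), and the identity (3.31) that is derived later in this section, can be checked numerically on a concrete K-D family. The Python sketch below is our own illustration and assumes the Bernoulli family in the mean parametrization, $q(p) = \log(p/(1-p))$, $T(z) = z$, $b(p) = -\log(1-p)$, for which $A(p) = 1/(p(1-p))$; derivatives are approximated by central finite differences (`d1`, `d2` are hypothetical helper names).

```python
import math

# Bernoulli family in the form f_p(z) = exp(q(p) T(z) - b(p)), z in {0, 1}, using the
# mean parametrisation p (an illustrative choice): q(p) = log(p/(1-p)), T(z) = z, b(p) = -log(1-p).
def q(p):
    return math.log(p / (1 - p))

def b(p):
    return -math.log(1 - p)

def d1(f, x, h=1e-5):
    """Central first difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central second difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def info_direct(p):
    """A(p) = E_p[(dPhi/dp)^2], where dPhi/dp(p, z) = q'(p) z - b'(p)."""
    qp, bp = d1(q, p), d1(b, p)
    return p * (qp - bp) ** 2 + (1 - p) * bp ** 2

def info_from_C(p):
    """Proposition (C): A(p) = q'(p)^2 var_p[T(Z_1)], with var_p[T(Z_1)] = p(1 - p)."""
    return d1(q, p) ** 2 * p * (1 - p)

def info_from_331(p):
    """Identity (3.31): A(p) = b''(p) - q''(p) b'(p) / q'(p)."""
    return d2(b, p) - d2(q, p) * d1(b, p) / d1(q, p)

for p in (0.2, 0.5, 0.7):
    # Proposition (A): b'(p)/q'(p) should equal E_p[T(Z_1)] = p; all three A's should equal 1/(p(1-p)).
    print(p, d1(b, p) / d1(q, p), info_direct(p), info_from_C(p), info_from_331(p), 1 / (p * (1 - p)))
```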

Let $\nu$ be a probability measure on $\Theta$ which has a continuous bounded density $\psi$ with respect to Lebesgue measure such that $\int \theta^2 \psi(\theta)\,d\theta < \infty$. Consider the problem of estimating $\theta$ sequentially, where the loss on taking $n$ observations and deciding $\theta = d$ is given by $nc + (d - \theta_0)^2$ when $\theta_0$ is the true value of the parameter. The overall risk $R(\delta, t)$ for a sequential procedure consisting of a stopping rule $t$ and an estimator $\delta(Z_1, \ldots, Z_t)$ is then given by

$$R(\delta, t) = cE(t) + E[(\delta(Z_1, \ldots, Z_t) - \theta)^2]. \tag{3.1}$$

It follows from the results of Arrow, Blackwell, and Girshick that, whatever the choice of $t$, the optimal estimate given $t$ is the conditional expectation of $\theta$ given the past, $\tilde\theta_t = E[\theta \mid Z_1, \ldots, Z_t]$. Hence finding optimal procedures for the sequential problem is equivalent to constructing optimal stopping rules for the sequence $\{X_n\}$, where $X_n = Y_n + nc$ and

$$Y_n = E[(\theta - \tilde\theta_n)^2 \mid Z_1, \ldots, Z_n] = \operatorname{var}(\theta \mid Z_1, \ldots, Z_n). \tag{3.2}$$

In order to find an A.P.O. rule by the method of theorem 2.1 (i), we have to show that $P(Y_n > 0) = 1$ and $nY_n \to V$ a.s., where $P(V > 0) = 1$.

Theorem 3.1. For the K-D family obeying assumptions (i) and (ii), we have $P(Y_n > 0) = 1$ and

$$nY_n \to \frac{1}{A(\theta)} \quad \text{a.s.} \tag{3.3}$$
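For a concrete feel for theorem 3.1, the Python sketch below (our own illustration) takes Bernoulli observations with a uniform Beta(1, 1) prior, a conjugate choice we assume for convenience, so that $Y_n$ has a closed form. It runs rule (i) of theorem 2.1 and also checks that $nY_n$ approaches $1/A(p) = p(1-p)$ on a long sample. The names `posterior_variance` and `sequential_estimate`, the seed, and the value of $c$ are hypothetical.

```python
import random

def posterior_variance(successes, n, a=1.0, b=1.0):
    """Y_n = var(theta | Z_1, ..., Z_n) for Bernoulli data under a conjugate Beta(a, b) prior."""
    alpha, beta = a + successes, b + n - successes
    return alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

def sequential_estimate(draw, c, a=1.0, b=1.0, n_max=10 ** 7):
    """Stop the first time Y_n / n < c (rule (i) of theorem 2.1); return (n, posterior mean).

    `draw` is a hypothetical zero-argument callable returning the next 0/1 observation.
    """
    successes = 0
    for n in range(1, n_max + 1):
        successes += draw()
        if posterior_variance(successes, n, a, b) / n < c:
            return n, (a + successes) / (a + b + n)
    raise RuntimeError("no stop before n_max")

rng = random.Random(2)
p_true = 0.3
stop, estimate = sequential_estimate(lambda: int(rng.random() < p_true), c=1e-7)

# Theorem 3.1 predicts n * Y_n -> 1/A(p) = p(1 - p) for this family.
n_big = 100_000
k = sum(int(rng.random() < p_true) for _ in range(n_big))
n_y = n_big * posterior_variance(k, n_big)
print(stop, estimate, n_y, p_true * (1 - p_true))
```

Because $Y_n \approx p(1-p)/n$, the rule stops near $n \approx \sqrt{p(1-p)/c}$, about 1450 here.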

Proof. Since the a posteriori density exists with probability one, the conditional variance of $\theta$ is positive with probability one.

To show (3.3) we will establish

$$P_{\theta_0}\left\{nE[(\theta - \hat\theta_n)^2 \mid Z_1, \ldots, Z_n] \to \frac{1}{A(\theta_0)}\right\} = 1 \tag{3.4}$$

and

$$n^{1/2}\left(E[\theta \mid Z_1, \ldots, Z_n] - \hat\theta_n\right) \to 0 \tag{3.5}$$

with probability one, where $\hat\theta_n$ is the maximum likelihood estimate of $\theta$. The theorem readily follows from (3.4), (3.5), and the identity

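$$\operatorname{var}(\theta \mid Z_1, \ldots, Z_n) = E[(\theta - \hat\theta_n)^2 \mid Z_1, \ldots, Z_n] - \left[E(\theta \mid Z_1, \ldots, Z_n) - \hat\theta_n\right]^2. \tag{3.6}$$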

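Let us define $\psi^*(t \mid Z_1, \ldots, Z_n)$ to be the a posteriori density of $n^{1/2}(\theta - \hat\theta_n)$. Thus,

$$\psi^*(t \mid Z_1, \ldots, Z_n) = \frac{\exp\left\{\sum_{i=1}^n \Phi\left(\hat\theta_n + \frac{t}{\sqrt n}, Z_i\right)\right\}\psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)}{\sqrt n \int \exp\left\{\sum_{i=1}^n \Phi(s, Z_i)\right\}\psi(s)\,ds}. \tag{3.7}$$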

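Equations (3.4) and (3.5) follow easily from

$$P_{\theta_0}\left[\int_{-\infty}^{\infty} |t|^i \left|\psi^*(t \mid Z_1, \ldots, Z_n) - \sqrt{A(\theta_0)}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| dt \to 0\right] = 1 \tag{3.8}$$

for $i = 1, 2$, where $\phi(x)$ is the standard normal density.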
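The convergence in (3.8) can be examined numerically in the Bernoulli case. The Python sketch below is our own illustration under stated assumptions: a uniform prior, so the posterior is Beta$(s+1, n-s+1)$; data containing exactly $s = \mathrm{round}(p_0 n)$ successes, so that $\hat\theta_n = p_0$; and a Riemann sum over $|t| \le 4$ in place of the full integral. `beta_pdf` and `bvm_distance` are hypothetical names.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density, computed on the log scale for numerical stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def bvm_distance(n, p0=0.3, i=0, half_width=4.0, step=1e-3):
    """Riemann-sum approximation of the integral in (3.8) for Bernoulli data, uniform prior.

    Assumes the data contain exactly round(p0 * n) successes, so the MLE equals p0.
    """
    s = round(p0 * n)
    mle = s / n
    a_info = 1.0 / (p0 * (1.0 - p0))  # Fisher information A(p0)
    total = 0.0
    for k in range(int(2 * half_width / step) + 1):
        t = -half_width + k * step
        # Density of sqrt(n)(theta - mle) when theta | data ~ Beta(s + 1, n - s + 1).
        post = beta_pdf(mle + t / math.sqrt(n), s + 1, n - s + 1) / math.sqrt(n)
        normal = math.sqrt(a_info) * math.exp(-a_info * t * t / 2) / math.sqrt(2 * math.pi)
        total += abs(t) ** i * abs(post - normal) * step
    return total

for n in (200, 2000):
    print(n, [round(bvm_distance(n, i=i), 4) for i in (0, 1, 2)])
```

The weighted distances shrink as $n$ grows, at roughly the $n^{-1/2}$ rate one expects from the skewness of the Beta posterior.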

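Define the random quantity

$$v_n(t) = \exp\left\{\sum_{i=1}^n \left[\Phi\left(\hat\theta_n + \frac{t}{\sqrt n}, Z_i\right) - \Phi(\hat\theta_n, Z_i)\right]\right\}. \tag{3.9}$$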

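To prove (3.8), it suffices to show

$$\int_{-\infty}^{\infty} |t|^i \left|v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right) dt \to 0 \quad \text{a.s. } P_{\theta_0} \tag{3.10}$$

for $i = 0, 1, 2$. To see this, note that by the case $i = 0$ we would have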

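$$\int_{-\infty}^{\infty} v_n(t)\,\psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right) dt - \int_{-\infty}^{\infty} \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right) dt \to 0. \tag{3.11}$$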


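Now by the dominated convergence theorem, the boundedness and continuity of $\psi$, and the consistency of $\hat\theta_n$, we have

$$\int_{-\infty}^{\infty} \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right) dt \to \frac{\sqrt{2\pi}\,\psi(\theta_0)}{\sqrt{A(\theta_0)}}. \tag{3.12}$$

Let

$$C_n = \sqrt n \int_{-\infty}^{\infty} \exp\left\{\sum_{i=1}^n \left[\Phi(\hat\theta_n + s, Z_i) - \Phi(\hat\theta_n, Z_i)\right]\right\}\psi(\hat\theta_n + s)\,ds. \tag{3.13}$$

Then

$$\psi^*(t \mid Z_1, \ldots, Z_n) = \frac{v_n(t)\,\psi\left(\hat\theta_n + t/\sqrt n\right)}{C_n}, \tag{3.14}$$

and since $\psi^*$ is a probability density, we have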
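$$\int_{-\infty}^{\infty} \psi^*(t \mid Z_1, \ldots, Z_n)\,dt = 1. \tag{3.15}$$

Therefore

$$C_n = \int_{-\infty}^{\infty} v_n(t)\,\psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right) dt, \tag{3.16}$$

which by (3.11) and (3.12) tends to $\sqrt{2\pi}\,\psi(\theta_0)/\sqrt{A(\theta_0)}$ a.s. $P_{\theta_0}$, and the sufficiency of (3.10) for our result is clear.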

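Write (3.10) as

$$\int_{|t| \le \delta^*\sqrt n} \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)|t|^i \left|v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| dt + \int_{|t| > \delta^*\sqrt n} \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)|t|^i \left|v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| dt. \tag{3.17}$$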


We first establish the following lemma.

Lemma 3.1. Under the above conditions,

$$A_n = \int_{|t| > \delta^*\sqrt n} \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)|t|^i v_n(t)\,dt \to 0 \quad \text{a.s. } P_{\theta_0}, \tag{3.18}$$

for $i = 0, 1, 2$, and all $\delta^* > 0$.

Proof. We change variables to $y = t/\sqrt n$. Then

$$A_n = n^{(i+1)/2} \int_{|y| > \delta^*} |y|^i \exp\left\{\sum_{j=1}^n \left[\Phi(\hat\theta_n + y, Z_j) - \Phi(\hat\theta_n, Z_j)\right]\right\}\psi(\hat\theta_n + y)\,dy. \tag{3.19}$$

Define

$$H_n(y) = \left[q(\hat\theta_n + y) - q(\hat\theta_n)\right]\frac{1}{n}\sum_{j=1}^n T(Z_j) - \left[b(\hat\theta_n + y) - b(\hat\theta_n)\right]. \tag{3.20}$$

Then, in our case, (3.19) reduces to

$$A_n = n^{(i+1)/2} \int_{|y| > \delta^*} |y|^i \exp\{nH_n(y)\}\,\psi(y + \hat\theta_n)\,dy. \tag{3.21}$$

By (D), for $n$ sufficiently large the equation $H_n'(y) = 0$ has a unique solution, given by $y = 0$; moreover, $0$ is then the unique local maximum of $H_n$. Therefore we may conclude that

$$\sup_{|y| \ge \delta^*} H_n(y) = \max\left(H_n(\delta^*), H_n(-\delta^*)\right) < -M < 0 \tag{3.22}$$

eventually. Therefore,

$$A_n \le n^{(i+1)/2} \exp(-Mn) \int_{-\infty}^{\infty} |y|^i \psi(y + \hat\theta_n)\,dy \to 0 \quad \text{a.s. } P_{\theta_0}, \tag{3.23}$$

since the $\hat\theta_n$ are bounded a.s., which proves the lemma.

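From lemma 3.1, the boundedness of $\psi$, and the well-known properties of the normal distribution, it follows that

$$\int_{|t| > \delta^*\sqrt n} \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)|t|^i \left|v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| dt \to 0 \quad \text{a.s. } P_{\theta_0}. \tag{3.24}$$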


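We finish the proof of the theorem with lemma 3.2.

Lemma 3.2. Under the above conditions, there exists a $\delta^* > 0$ such that

$$B_n = \int_{|t| \le \delta^*\sqrt n} \psi\left(\hat\theta_n + \frac{t}{\sqrt n}\right)|t|^i \left|v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right)\right| dt \to 0 \quad \text{a.s. } P_{\theta_0}. \tag{3.25}$$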

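Proof. We expand $\log v_n(t)$ formally to get

$$\log v_n(t) = \frac{t}{\sqrt n}\sum_{i=1}^n \frac{\partial \Phi}{\partial \theta}(\hat\theta_n, Z_i) + \frac{t^2}{2n}\sum_{i=1}^n \frac{\partial^2 \Phi}{\partial \theta^2}\left(\theta_n^*(t, Z_i), Z_i\right), \tag{3.26}$$

where $\theta_n^*(t, Z_i)$ lies between $\hat\theta_n$ and $\hat\theta_n + t/\sqrt n$.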

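Of course, $\sum_{i=1}^n \frac{\partial \Phi}{\partial \theta}(\hat\theta_n, Z_i) = 0$ whenever this expression is valid. In our case, (3.26) is valid and simplifies to

$$\log v_n(t) = \frac{t^2}{2}\left\{q''(\theta_n(t))\,\frac{1}{n}\sum_{i=1}^n T(Z_i) - b''(\theta_n(t))\right\}, \tag{3.27}$$

where $\theta_n(t)$ lies between $\hat\theta_n$ and $\hat\theta_n + t/\sqrt n$.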


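Choose $\varepsilon > 0$ so that $3\varepsilon < A(\theta_0)$. Then, by the continuity of $q''$ and $b''$, there exists $\delta^*(\varepsilon)$ such that

$$|q''(s) - q''(\theta_0)| < \frac{\varepsilon\,|q'(\theta_0)|}{2\,|b'(\theta_0)|} \tag{3.28}$$

and $|b''(s) - b''(\theta_0)| < \varepsilon$ for $|s - \theta_0| < \delta^*(\varepsilon)$.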

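On the other hand, with probability one, for $n$ sufficiently large,

$$\left|\frac{1}{n}\sum_{i=1}^n T(Z_i) - \frac{b'(\theta_0)}{q'(\theta_0)}\right| < \min\left(\frac{\varepsilon}{|q''(\theta_0)|}, \frac{|b'(\theta_0)|}{|q'(\theta_0)|}\right), \tag{3.29}$$

and therefore, for such $n$ and $|t/\sqrt n| < \delta^*(\varepsilon)$, we have

$$\left|\frac{2}{t^2}\log v_n(t) - \left(q''(\theta_0)\frac{b'(\theta_0)}{q'(\theta_0)} - b''(\theta_0)\right)\right| < 3\varepsilon. \tag{3.30}$$

But

$$q''(\theta_0)\frac{b'(\theta_0)}{q'(\theta_0)} - b''(\theta_0) = -A(\theta_0). \tag{3.31}$$

Equality (3.31) follows by double differentiation of the identity

$$\int e^{q(\theta)T(z) - b(\theta)}\,\mu(dz) = 1 \tag{3.32}$$

and (C).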

Therefore $v_n(t) \le \exp\{(3\varepsilon - A(\theta_0))\,t^2/2\}$ for $n$ sufficiently large, independently of $t$. But $v_n(t) - \sqrt{2\pi}\,\phi\left(\sqrt{A(\theta_0)}\,t\right) \to 0$ for each fixed $t$ by (3.27) and (3.31). Applying the dominated convergence theorem, we obtain the lemma.
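The pointwise limit used in the last step, $\log v_n(t) \to -A(\theta_0)t^2/2$, can be seen directly in the Bernoulli case. The Python sketch below is our own illustration and assumes the mean parametrization, $\Phi(\theta, z) = z\log\theta + (1 - z)\log(1 - \theta)$, and data with exactly $\mathrm{round}(p_0 n)$ successes; `log_v` is a hypothetical name.

```python
import math

def log_v(n, t, p0):
    """log v_n(t) of (3.9) for Bernoulli data (mean parametrisation) with round(p0 * n) successes.

    Phi(theta, z) = z log(theta) + (1 - z) log(1 - theta), so the sum in (3.9) collapses to two terms.
    """
    s = round(p0 * n)
    mle = s / n
    h = t / math.sqrt(n)
    return s * math.log1p(h / mle) + (n - s) * math.log1p(-h / (1 - mle))

p0, t = 0.3, 1.0
A = 1 / (p0 * (1 - p0))  # Fisher information A(p0)
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    # (3.27) and (3.31) predict log v_n(t) -> -A(p0) t^2 / 2, i.e. v_n(t) -> sqrt(2 pi) phi(sqrt(A) t).
    print(n, log_v(n, t, p0), -A * t * t / 2)
```

The gap shrinks like $n^{-1/2}$, which is the size of the cubic term omitted from the quadratic expansion (3.26).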

The theorem is now an immediate consequence, since $\psi$ is bounded. For reference we now consider the general model and state theorem 3.2. Let $\Theta$ be an open subset of the line. Let $\{Z_i, i \ge 1\}$ be distributed according to $f_\theta(x)$, a density with respect to a σ-finite measure $\mu$, for $\theta \in \Theta$. Let $\psi$ be a probability density on $\Theta$ with respect to Lebesgue measure satisfying the conditions of this section. We define $\Phi(\theta, x) = \log f_\theta(x)$ as before. We then have the following theorem.

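Theorem 3.2. If

(1) $\frac{\partial^2 \Phi}{\partial \theta^2}(\theta, x)$ exists and is continuous for almost all $x$;

(2) $E_\theta\left(\sup_{|s - \theta| \ge \varepsilon}\left[\Phi(s, Z_1) - \Phi(\theta, Z_1)\right]\right) < 0$ for almost all $\theta$ and all $\varepsilon > 0$;

(3) $E_\theta\left(\sup_{|s - \theta| \le \varepsilon}\left|\frac{\partial^2 \Phi}{\partial \theta^2}(s, Z_1)\right|\right) < \infty$ for some $\varepsilon > 0$, for almost all $\theta$;

(5) maximum likelihood estimates $\{\hat\theta_n\}$ of $\theta$ exist and are consistent;

(6) $\psi$ satisfies the conditions of this section, is continuous and bounded, and $\int \theta^2 \psi(\theta)\,d\theta < \infty$;

then $nY_n \to 1/A(\theta)$ a.s., where $A(\theta) = E_\theta\left[\left(\frac{\partial \Phi}{\partial \theta}(\theta, Z_1)\right)^2\right]$ is the Fisher information number and $Y_n = \operatorname{var}(\theta \mid Z_1, \ldots, Z_n)$.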


4. Sequential testing

The main theorem of this section states that for the one-parameter K-D family, sequential Bayesian testing satisfies condition (ii) of theorem 2.1, and therefore the rules given in theorem 2.1 (ii)a and (ii)b are A.P.O.

Again, at the end of the section we shall state a more general theorem, whose proof will appear elsewhere.

Without loss of generality, we assume $\{Z_i, i \ge 1\}$ to be distributed according to the density $f_\theta(x) = e^{\theta T(x) - b(\theta)}$ with respect to some nondegenerate σ-finite measure $\mu$. Let $\Theta$ be as before and let $\nu$ be a probability measure on $\Theta$ such that $\nu$ assigns positive probability to any nonempty open subset of $\Theta$.

As is customary in the testing problem, we have a decomposition of $\Theta$ into two disjoint Borel sets $H$ and $\bar H$ ($\bar H$ the complement of $H$), $H$ being the hypothesis. We have a choice of two decisions (accepting or rejecting $H$); we pay no penalty for the right decision and incur a measurable loss $l(\theta) \ge 0$ when $\theta$ is the true parameter and we make the wrong decision. We assume that $\int l(\theta)\,\nu(d\theta) < \infty$. In addition, as usual, we pay $c > 0$ for each observation. The overall risk $R(\varphi, t)$ for a sequential procedure consisting of a stopping rule $t$ and a randomized test $\varphi(Z_1, \ldots, Z_t)$ is then given by

$$R(\varphi, t) = cE(t) + E\left[\varphi(Z_1, \ldots, Z_t)\,l(\theta)I_H(\theta)\right] + E\left[(1 - \varphi(Z_1, \ldots, Z_t))\,l(\theta)I_{\bar H}(\theta)\right], \tag{4.1}$$

where $I_A(\theta)$ is 1 if $\theta \in A$ and 0 otherwise. Again by [1], we can separate the final decision problem from the stopping problem; that is, there is an obvious optimal choice of $\varphi$ given $t$. We may now write $X_n = Y_n + nc$, where

$$Y_n = \varphi_B(Z_1, \ldots, Z_n)\,E\left[l(\theta)I_H(\theta) - l(\theta)I_{\bar H}(\theta) \mid Z_1, \ldots, Z_n\right] + E\left[l(\theta)I_{\bar H}(\theta) \mid Z_1, \ldots, Z_n\right], \tag{4.2}$$

and $\varphi_B$ is the Bayes test given $n$ observations, so that $Y_n$ is the smaller of $E[l(\theta)I_H(\theta) \mid Z_1, \ldots, Z_n]$ and $E[l(\theta)I_{\bar H}(\theta) \mid Z_1, \ldots, Z_n]$. Now again the problem is to find optimal stopping rules for the process $X_n$. We will establish, under some regularity conditions on $\nu$, that $(\log Y_n)/n \to U$ and $P(U < 0) = 1$. Let $\nu^*$ be the measure defined by $\nu^*(A) = \int_A l(\theta)\,\nu(d\theta)$. We have the following theorem.

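Theorem 4.1. Assume, in addition to the conditions given beforehand in this section:

(1) $0 < \nu^*(H) < \nu^*(\Theta)$,

(2) $\nu(\partial H) = 0$, where $\partial H$ is the boundary of $H$, and $\nu(U) > 0$ for all open $U$, and

(3) $l(\theta)$ is strictly bounded away from zero outside of some compact $K$. Then

$$\frac{\log Y_n}{n} \to (\nu^*)\operatorname*{ess\,sup}_{\theta \in \bar H} J(\theta, \theta_0)\, I_H(\theta_0) + (\nu^*)\operatorname*{ess\,sup}_{\theta \in H} J(\theta, \theta_0)\, I_{\bar H}(\theta_0) = B(\theta_0) \quad \text{a.s. } P_{\theta_0}, \tag{4.3}$$

where

$$J(\theta, \theta_0) = E_{\theta_0}\left[\Phi(\theta, Z_1) - \Phi(\theta_0, Z_1)\right], \tag{4.4}$$

and $I_A(\theta)$ is the indicator function of $A$.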
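A case in which everything in theorem 4.1 is explicit: take $N(\theta, 1)$ observations (a K-D family with $T(x) = x$, $b(\theta) = \theta^2/2$), a $N(0, 1)$ prior, loss $l \equiv 1$, and $H = (-\infty, 0]$. Then $J(\theta, \theta_0) = -(\theta - \theta_0)^2/2$, and for $\theta_0 = 1$ the limit in (4.3) is $B(\theta_0) = -1/2$. The Python sketch below is our own illustration of these assumed choices; to keep it deterministic it feeds in a hypothetical data stream whose sample mean equals $\theta_0$ exactly, and `log_posterior_risk` is a hypothetical name.

```python
import math

def log_posterior_risk(n, xbar):
    """log Y_n for N(theta, 1) data, a N(0, 1) prior, H = (-inf, 0], and loss l = 1.

    The posterior is N(n*xbar/(n+1), 1/(n+1)); the Bayes test leaves
    Y_n = min(P(theta <= 0 | data), P(theta > 0 | data)) = Phi(-z), z = n*|xbar|/sqrt(n+1).
    """
    z = n * abs(xbar) / math.sqrt(n + 1)
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))  # log Phi(-z); underflows for z beyond ~37

theta0 = 1.0
B = -theta0 ** 2 / 2  # B(theta0) = sup over H of J(theta, theta0) = -(theta - theta0)^2 / 2
for n in (10, 100, 1000):
    # Hypothetical data stream whose sample mean equals theta0 exactly.
    print(n, log_posterior_risk(n, theta0) / n, B)
```

The printed ratios approach $-1/2$ from below; the remaining gap is of order $(\log n)/n$, coming from the Gaussian tail prefactor.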

Proof. It is well known that $J(\theta, \theta_0) < 0$ if $P_\theta \ne P_{\theta_0}$. This observation and the following lemma will establish that $P[B(\theta_0) < 0] = 1$ in our case.

Lemma 4.1. In a K-D family as above, $J(\theta, \theta_0)$ is concave in $\theta$ with a unique maximum of 0 at $\theta = \theta_0$.

Proof. According to proposition (A), $J(\theta, \theta_0) = (\theta - \theta_0)b'(\theta_0) + b(\theta_0) - b(\theta)$. Further, $J'(\theta_0, \theta_0) = 0$ and $J''(\theta, \theta_0) = -b''(\theta) < 0$ by (B). The lemma follows.
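Lemma 4.1 can be checked numerically for the Bernoulli family in natural form, $f_\theta(z) = \exp\{\theta z - b(\theta)\}$ with $b(\theta) = \log(1 + e^\theta)$, which is our own choice of example. The Python sketch below compares the closed form of $J$ from the proof with a direct evaluation of definition (4.4) and verifies concavity and the maximum of 0 at $\theta_0$ on a grid; `mean`, `J`, and `J_direct` are hypothetical names.

```python
import math

def b(theta):
    """Cumulant function of the Bernoulli family in natural form f_theta(z) = exp(theta z - b(theta))."""
    return math.log1p(math.exp(theta))

def mean(theta):
    """b'(theta) = E_theta[T(Z_1)], the success probability."""
    return 1 / (1 + math.exp(-theta))

def J(theta, theta0):
    """Closed form from the proof of lemma 4.1."""
    return (theta - theta0) * mean(theta0) + b(theta0) - b(theta)

def J_direct(theta, theta0):
    """Definition (4.4), evaluated as E_theta0[Phi(theta, Z_1) - Phi(theta0, Z_1)] over z in {0, 1}."""
    p0 = mean(theta0)

    def phi(th, z):  # Phi(theta, z) = log f_theta(z)
        return th * z - b(th)

    return p0 * (phi(theta, 1) - phi(theta0, 1)) + (1 - p0) * (phi(theta, 0) - phi(theta0, 0))

theta0 = 0.4
grid = [-3 + 0.1 * k for k in range(61)]
values = [J(th, theta0) for th in grid]
print(max(values), grid[values.index(max(values))])  # maximum close to 0, attained near theta0
```

Since $J(\theta, \theta_0)$ is minus the Kullback-Leibler divergence of $P_\theta$ from $P_{\theta_0}$, its negativity away from $\theta_0$ is the "well-known" fact quoted at the start of the proof.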

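To prove convergence of $(\log Y_n)/n$, it evidently suffices to consider $Y_n^{1/n}$, which is given by

$$Y_n^{1/n} = \left[\int_\Theta e^{nQ_n(\theta)}\,\nu(d\theta)\right]^{-1/n} \min\left\{\left[\int_H l(\theta)e^{nQ_n(\theta)}\,\nu(d\theta)\right]^{1/n}, \left[\int_{\bar H} l(\theta)e^{nQ_n(\theta)}\,\nu(d\theta)\right]^{1/n}\right\}, \tag{4.5}$$

where $Q_n(\theta) = \left[\frac{1}{n}\sum_{i=1}^n T(Z_i)\right]\theta - b(\theta)$. Let $Q(\theta, \theta_0) = \theta b'(\theta_0) - b(\theta)$.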

Lemma 4.2. Let $\{W_n\}$ be a sequence of essentially bounded random variables such that $\operatorname{ess\,sup}|W_n - W| \to 0$. Then $E^{1/n}|W_n|^n \to \operatorname{ess\,sup}|W|$.

Proof. By Minkowski's inequality,

$$E^{1/n}|W|^n - E^{1/n}|W_n - W|^n \le E^{1/n}|W_n|^n \le E^{1/n}|W|^n + E^{1/n}|W_n - W|^n. \tag{4.6}$$

Since $E^{1/n}|W_n - W|^n \le \operatorname{ess\,sup}|W_n - W|$, the lemma follows from the convergence of the $L_n$ norm to the $L_\infty$ norm. Q.E.D.
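The fact that finishes the proof of lemma 4.2, namely that the $L_n$ norm $E^{1/n}|W|^n$ increases to $\operatorname{ess\,sup}|W|$, is easy to see for a discrete random variable. The Python sketch below is our own illustration with arbitrary assumed values and probabilities; `ln_norm` is a hypothetical name. A value carried with probability zero is deliberately included to show that it does not affect the essential supremum.

```python
def ln_norm(values, probs, n):
    """E^{1/n}|W|^n for a discrete W taking `values` with probabilities `probs`."""
    top = max(abs(v) for v, p in zip(values, probs) if p > 0)  # ess sup |W|: null atoms ignored
    # Factor out ess sup |W| so that large n neither overflows nor underflows.
    s = sum(p * (abs(v) / top) ** n for v, p in zip(values, probs) if p > 0)
    return top * s ** (1 / n)

values = [0.5, 1.0, 2.0, 3.0, 10.0]
probs = [0.4, 0.3, 0.29, 0.01, 0.0]  # the value 10 has probability 0, so ess sup |W| = 3
norms = [ln_norm(values, probs, n) for n in (1, 2, 5, 10, 100, 1000)]
print(norms)
```

The norms are nondecreasing in $n$ (Lyapunov's inequality) and converge slowly, at rate roughly $\log(1/P[|W| = \operatorname{ess\,sup}|W|])/n$.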

To establish the theorem, it suffices to show that if $\nu^*(B) > 0$, then

$$\left[\int_B e^{nQ_n(\theta)}\,\nu^*(d\theta)\right]^{1/n} \to (\nu^*)\operatorname*{ess\,sup}_{\theta \in B} \exp Q(\theta, \theta_0), \tag{4.7}$$
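and in particular

$$\left[\int_\Theta e^{nQ_n(\theta)}\,\nu(d\theta)\right]^{1/n} \to (\nu)\operatorname*{ess\,sup}_{\theta \in \Theta} \exp Q(\theta, \theta_0), \tag{4.8}$$

since then, a.s. $P_{\theta_0}$, $Y_n^{1/n}$ converges to

$$\frac{\min\left\{(\nu^*)\operatorname*{ess\,sup}_{\theta \in H} \exp Q(\theta, \theta_0),\ (\nu^*)\operatorname*{ess\,sup}_{\theta \in \bar H} \exp Q(\theta, \theta_0)\right\}}{(\nu)\operatorname*{ess\,sup}_{\theta \in \Theta} \exp Q(\theta, \theta_0)}. \tag{4.9}$$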


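Now $(\nu)\operatorname{ess\,sup}_{\theta \in \Theta} Q(\theta, \theta_0) = Q(\theta_0, \theta_0)$ by lemma 4.1 and condition (2), and $J(\theta, \theta_0) = Q(\theta, \theta_0) - Q(\theta_0, \theta_0)$. We prove (4.7). By lemma 4.1 and condition (3), there exists a compact $K$ such that

$$(\nu^*)\operatorname*{ess\,sup}_{\theta \in K \cap B} Q(\theta, \theta_0) = (\nu^*)\operatorname*{ess\,sup}_{\theta \in B} Q(\theta, \theta_0) \tag{4.10}$$

and

$$(\nu^*)\operatorname*{ess\,sup}_{\theta \in K^c \cap B} Q(\theta, \theta_0) < (\nu^*)\operatorname*{ess\,sup}_{\theta \in B} Q(\theta, \theta_0). \tag{4.11}$$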

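Now clearly $\nu^*(K \cap B) > 0$. Remark first that

$$[\nu^*(B)]^{1/n}\,(\nu^*)\operatorname*{ess\,sup}_{\theta \in B} \exp Q_n(\theta) \ge \left[\int_B e^{nQ_n(\theta)}\,\nu^*(d\theta)\right]^{1/n} \ge \left[\int_{K \cap B} e^{nQ_n(\theta)}\,\nu^*(d\theta)\right]^{1/n}. \tag{4.12}$$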

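But by lemma 4.2,

$$\lim_{n \to \infty}\left[\int_{K \cap B} e^{nQ_n(\theta)}\,\nu^*(d\theta)\right]^{1/n} = \lim_{n \to \infty}\left[\frac{1}{\nu^*(K \cap B)}\int_{K \cap B} e^{nQ_n(\theta)}\,\nu^*(d\theta)\right]^{1/n} = (\nu^*)\operatorname*{ess\,sup}_{\theta \in K \cap B} \exp Q(\theta, \theta_0), \tag{4.13}$$

since $Q_n(\theta) \to Q(\theta, \theta_0)$ a.s. $P_{\theta_0}$ uniformly on $K$ by the S.L.L.N. On the other hand, by lemma 4.1, (4.11), and the S.L.L.N.,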


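$$(\nu^*)\operatorname*{ess\,sup}_{\theta \in B} \exp Q_n(\theta) \to (\nu^*)\operatorname*{ess\,sup}_{\theta \in B} \exp Q(\theta, \theta_0). \tag{4.14}$$

By (4.10) the limits in (4.13) and (4.14) coincide, so the two bounds in (4.12) squeeze the middle term and (4.7) follows, which completes the proof of the theorem.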

As in section 3, we again state a general theorem without proof. Let $\{Z_i, i \ge 1\}$ be distributed according to a density $f_\theta(x)$ with respect to a σ-finite measure $\mu$, for $\theta \in \Theta \subset R^p$ for some $p$, $\Theta$ Borel measurable. Let $H$ be a measurable subset of $\Theta$. Then let $\nu$, $l(\theta)$, $\nu^*$, $Y_n$, $J(\theta, \theta_0)$, $B(\theta)$, $\Phi(\theta, x)$, and $\partial H$ be defined as before. We have the following theorem.

Theorem 4.2. Suppose that

(1) $\nu(U) > 0$ for any open set $U$, and $0 < \nu^*(H) < \nu^*(\Theta)$;

(2) $\nu^*(\partial H) = 0$;

(3) $l(\theta) \ge 0$ and is strictly positive outside a compact;

(4) $\Phi(\theta, x)$ is continuous in $\theta$ for almost all $x$;

(5) $E_{\theta_0}\left\{\sup_{\|s - t\| < \Delta(\theta_0)} |\Phi(s, Z_1) - \Phi(t, Z_1)|\right\} < \infty$ for some $\Delta(\theta_0) > 0$, and for all $\theta_0$;

(6) $E_{\theta_0}[\Phi(s, Z_1)] > -\infty$ for all $s$;

(7) $E_{\theta_0}\left\{\sup_{\|\theta - \theta_0\| > K(\theta_0)}\left[\Phi(\theta, Z_1) - \Phi(\theta_0, Z_1)\right]\right\} < B(\theta_0)$ for some $K(\theta_0) < \infty$.

Then $(\log Y_n)/n \to B(\theta_0)$ a.s. $P_{\theta_0}$.

This theorem of course covers the multivariate as well as the univariate K-D families, and many other examples too. The reader will also note the by no means accidental resemblance of our conditions to those of Kiefer and Sacks in [3]. Of course, the conditions required to prove pointwise optimality are less stringent.

6. Conclusion

Some of the procedures suggested in this paper, and similar A.P.O. procedures for estimation and testing, have already appeared in the literature. Thus Wald [6] proved that, under some regularity conditions similar to those of theorem 3.2, the following procedure is asymptotically minimax: “Stop the first time $1/(n(n+1)A(\hat\theta_n)) < c$,” where $\hat\theta_n$ is the maximum likelihood estimate and $A(\theta)$ is as before. It is not difficult to verify that this procedure is asymptotically equivalent to the rule given in theorem 2.1 (i) since, under the conditions of theorem 3.2,

$$nY_n A(\hat\theta_n) \to 1 \quad \text{a.s.} \tag{6.1}$$

Schwarz [4] showed the procedure of theorem 2.1 (ii)a to have asymptotically the same shape as the optimal Bayes region for the exponential family under essentially the conditions of theorem 4.1. Kiefer and Sacks [3] extended his results to more general families and strengthened them. They proved, under some regularity conditions, that in the presence of an indifference region between hypothesis and alternative, the procedure “Stop when $Y_n$ is first $< c$” is asymptotically Bayes. It may be shown from their results that the Bayes optimal rule is A.P.O., as might be expected.

The procedure given in theorem 2.1 (ii)b seems to be “better” if an indifference region is not assumed. We are at present investigating the connection between A.P.O. rules and asymptotic Bayes solutions in this and other instances. The results of [3], [4], and [6] give the reader some idea of what can be expected. It may be noted that the rules of Wald and one of the asymptotically Bayes rules proposed by Kiefer and Sacks (which is A.P.O.) are essentially independent of the choice of prior distribution. In general (because of dependence on large samples), the concept of asymptotic pointwise optimality seems to be “prior distribution free,” a property which augurs well for its application to non-Bayesian and even nonparametric statistics. We hope to explore these questions also in subsequent papers.
Finally, it seems that the results of this paper are answers to interesting examples of a more general question. Suppose we are given a stochastic sequence of processes $\{X_c(t)\}$ consisting of a deterministic component $\{D_c(t)\}$ and a noise component $\{N_c(t)\}$. Let $d(c)$ denote the time at which $\{D_c(t)\}$ reaches its minimum and $o(c)$ the time at which $\{X_c(t)\}$ reaches its minimum, and suppose that $o(c) \to \infty$ as $c \to 0$. Assume further that we can estimate $D_c(t)$ consistently from $X_c(t)$ by $\hat D_c(t)$, where consistency refers to the behavior of $\hat D_c$ as $c \to 0$. Let $\hat d(c)$ denote the approximation to $d(c)$ based on $\hat D_c(t)$. When is it true that $D_c(\hat d(c)) \sim D_c(d(c)) \sim X_c(o(c))$? Obviously, $N_c(t) \to 0$ as $c \to 0$, but further investigation is required. We intend to deal with this question in a forthcoming paper.

We would like to thank A. Dvoretzky for a remark which led to corollary 2.1.